A Trainable Visually-grounded Spoken Language Generation System

Author

Deb Roy

Abstract

A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using these structures, a planning algorithm integrates syntactic, semantic, and contextual constraints to generate natural and unambiguous descriptions of objects in novel scenes. The output of the generation system is synthesized using word-based concatenative synthesis drawing from the original training speech corpus. In evaluations of semantic comprehension by human judges, the performance of automatically generated spoken descriptions was comparable to human-generated descriptions.

Related resources

A trainable spoken language understanding system for visual object selection

We present a trainable, visually-grounded, spoken language understanding system. The system acquires a grammar and vocabulary from a “show-and-tell” procedure in which visual scenes are paired with verbal descriptions. The system is embodied in a table-top mounted active vision platform. During training, a set of objects is placed in front of the vision system. Using a laser pointer, the system...

Grounding Natural Spoken Language Semantics in Visual Perception and Motor Control

A characteristic shared by most approaches to natural language understanding and generation is the use of symbolic representations of word and sentence meanings. Frames and semantic nets are two popular current approaches. Symbolic methods alone are inadequate for applications such as conversational robotics that require natural language semantics to be linked to perception and motor control. T...

Evaluating a Trainable Sentence Planner for a Spoken Dialogue System

Techniques for automatically training modules of a natural language generator have recently been proposed, but a fundamental concern is whether the quality of utterances produced with trainable components can compete with hand-crafted template-based or rule-based approaches. In this paper we experimentally evaluate a trainable sentence planner for a spoken dialogue system by eliciting subjective...

Learning visually grounded words and syntax for a scene description task

A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a ‘show-and-tell’ procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using ...

A trainable generator for recommendations in multimodal dialog

As the complexity of spoken dialogue systems has increased, there has been increasing interest in spoken language generation (SLG). SLG promises portability across application domains and dialogue situations through the development of application-independent linguistic modules. However, in practice, rule-based SLGs often have to be tuned to the application. Recently, a number of research groups have ...

Publication date: 2002
